Add bulkFetch and documentClassCounts API functions by stevevanhooser · Pull Request #55 · Waltham-Data-Science/NDI-python

stevevanhooser · 2026-04-20T23:35:21Z

Summary

Add two new document API functions to support efficient bulk document retrieval and document class histogram queries.

Key Changes

- bulkFetch(): New function to synchronously fetch up to 500 documents by ID via POST /datasets/{datasetId}/documents/bulk-fetch
  - Validates inputs: non-empty list, max 500 entries, each a 24-character hex string
  - Returns list of document dicts with full data
  - Silently omits non-existent, soft-deleted, or mismatched documents
  - Intended for small subsets (e.g., from ndiquery results)
- documentClassCounts(): New function to retrieve document class histogram via GET /datasets/{datasetId}/document-class-counts
  - Returns flat histogram grouped by leaf data.document_class.class_name
  - Includes fields: datasetId, totalDocuments, and classCounts mapping
  - Missing/empty class names bucketed under 'unknown'
- Input validation: Added regex pattern _HEX24 to validate 24-character hex document IDs
- Comprehensive test coverage: Added 8 unit tests covering happy paths, validation errors, and edge cases
- MATLAB bridge documentation: Updated sync metadata tracking both functions as synchronized with MATLAB main as of 2026-04-20
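
The input checks listed for bulkFetch() can be sketched roughly as follows. This is an illustration only, not the code from the PR: the helper name validate_document_ids and the constant MAX_BULK_FETCH are hypothetical, and the exact _HEX24 pattern is not shown in the description, so the version here (which accepts both upper- and lowercase hex digits) is an assumption.

```python
import re

# Assumption: the PR names the pattern _HEX24 but does not show it.
# This version accepts both letter cases.
_HEX24 = re.compile(r"[0-9a-fA-F]{24}")

# Limit stated in the PR description; the constant name is hypothetical.
MAX_BULK_FETCH = 500


def validate_document_ids(document_ids):
    """Hypothetical helper: raise ValueError unless the ID list is fetchable."""
    if not document_ids:
        raise ValueError("document_ids must be a non-empty list")
    if len(document_ids) > MAX_BULK_FETCH:
        raise ValueError(
            f"at most {MAX_BULK_FETCH} document IDs are allowed, "
            f"got {len(document_ids)}"
        )
    for doc_id in document_ids:
        if not isinstance(doc_id, str) or not _HEX24.fullmatch(doc_id):
            raise ValueError(f"not a 24-character hex string: {doc_id!r}")
    return list(document_ids)
```

Using fullmatch rather than match keeps a 25-character string from passing on the strength of its first 24 characters.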

Implementation Details

- Both functions use the @_auto_client and @validate_call decorators for consistency with existing API wrappers
- Input validation runs before any API call, so invalid inputs fail fast
- bulkFetch returns an empty list when the response has no documents field
- Follows existing patterns: delegates HTTP metadata to CloudClient and returns only the data payload
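
A minimal sketch of the fail-fast and missing-field behaviour described above, using a stand-in client so that it runs without network access. Everything here is hypothetical: FakeClient is not CloudClient, its post() signature is invented for the example, and the request body key documentIds is an assumption that the PR description does not confirm.

```python
class FakeClient:
    """Hypothetical stand-in for CloudClient; records calls instead of doing HTTP."""

    def __init__(self, payload):
        self.payload = payload
        self.calls = []

    def post(self, path, json):
        self.calls.append((path, json))
        return self.payload


def bulk_fetch_sketch(client, dataset_id, document_ids):
    # Fail fast: reject bad input before any request is issued.
    if not document_ids:
        raise ValueError("document_ids must be non-empty")
    payload = client.post(
        f"/datasets/{dataset_id}/documents/bulk-fetch",
        json={"documentIds": list(document_ids)},  # key name is an assumption
    )
    # A response without a 'documents' field is treated as "nothing found".
    return payload.get("documents", [])
```

With this shape, an invalid call leaves client.calls empty, which is one way a unit test could confirm that validation happened before the request.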

https://claude.ai/code/session_01Wv5mG4qAT66WtQ2NjMQP88

Mirrors two new commands added to the MATLAB +ndi/+cloud/+api/+documents namespace. MATLAB routes them through +implementation wrappers that normalize output style; the Python port uses CloudClient for the same role, so no +implementation mirror is needed.

INTERFACE UPDATE: Added bulkFetch and documentClassCounts entries to src/ndi/cloud/api/ndi_matlab_python_bridge.yaml.
- bulkFetch: POST /datasets/{datasetId}/documents/bulk-fetch; mirrors MATLAB input validation (non-empty, <= 500 entries, 24-char hex IDs) and returns the 'documents' array.
- documentClassCounts: GET /datasets/{datasetId}/document-class-counts; returns the datasetId/totalDocuments/classCounts struct.
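
To make the datasetId/totalDocuments/classCounts struct concrete, here is a local sketch that builds the same shape from a list of in-memory document bodies. It is not the server's implementation: the function name class_counts_sketch is hypothetical, and the document layout ({"document_class": {"class_name": ...}}) follows the inline bodies mentioned elsewhere in this PR.

```python
from collections import Counter


def class_counts_sketch(dataset_id, documents):
    """Hypothetical local equivalent of the histogram the endpoint returns."""
    counts = Counter()
    for doc in documents:
        class_name = (doc.get("document_class") or {}).get("class_name")
        # Missing or empty class names are bucketed under 'unknown'.
        counts[class_name or "unknown"] += 1
    return {
        "datasetId": dataset_id,
        "totalDocuments": sum(counts.values()),
        "classCounts": dict(counts),
    }
```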

The cloud search API no longer exposes document_class.class_name as a directly searchable field path. Class filtering now has to go through the 'isa' operator, which also rolls up subclasses. This was causing test_ndiqueryAll_paginates to return zero documents against the live server. Only the two cloud ndiquery tests are affected. Inline document bodies (e.g. {"document_class": {"class_name": "..."}}) and local session.database_search calls continue to use the field directly since they are not cloud search structures.

Replaces the regex-on-document_class.class_name idiom with the semantic equivalent ndi_query.all(), which is a static factory for isa('base'). Matches the NDI-matlab ndi.query.all() convention and avoids relying on the soon-to-be-removed document_class field path.
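
The difference between matching class_name exactly and filtering with 'isa' can be illustrated with a toy model. The class hierarchy below is hypothetical and the isa() function is a local stand-in, not the cloud search operator or the ndi_query API; it only shows why isa('base') selects every document when all classes descend from base.

```python
# Hypothetical class hierarchy: child class -> parent class.
PARENTS = {"element": "base", "probe": "element", "subject": "base"}


def isa(class_name, target):
    """True if class_name is target or descends from it (toy model)."""
    while class_name is not None:
        if class_name == target:
            return True
        class_name = PARENTS.get(class_name)
    return False


docs = [
    {"document_class": {"class_name": name}}
    for name in ("base", "element", "probe", "subject")
]

# Exact field match: only documents whose leaf class is 'element'.
exact = [d for d in docs if d["document_class"]["class_name"] == "element"]

# isa-style match: 'element' plus its subclasses ('probe' in this toy hierarchy).
rolled_up = [d for d in docs if isa(d["document_class"]["class_name"], "element")]

# isa('base') matches everything, mirroring the role of the all() factory.
everything = [d for d in docs if isa(d["document_class"]["class_name"], "base")]
```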

claude added 3 commits April 20, 2026 23:33

stevevanhooser merged commit cd0346d into main Apr 21, 2026
5 checks passed

stevevanhooser deleted the claude/ndi-cloud-api-porting-wumxY branch April 21, 2026 00:15
